Detailed molecular and epigenetic characterization of the pig IPEC-J2 and chicken SL-29 cell lines

Summary The pig IPEC-J2 and chicken SL-29 cell lines are of interest because of their untransformed nature and wide use in functional studies. Molecular characterization of these cell lines is important to gain insight into possible molecular aberrations. The aim of this paper is to provide a molecular and epigenetic characterization of the IPEC-J2 and SL-29 cell lines, a cell-line reference for the FAANG community, and future biomedical research. Whole genome sequencing, gene expression, DNA methylation, chromatin accessibility, and ChIP-seq of four histone marks (H3K4me1, H3K4me3, H3K27ac, H3K27me3) and an insulator (CTCF) are used to achieve these aims. Heteroploidy (aneuploidy) of various chromosomes was observed from whole genome sequencing analysis in both cell lines. Furthermore, higher gene expression for genes located on chromosomes with aneuploidy in comparison to diploid chromosomes was observed. Regulatory complexity of gene expression, DNA methylation, and chromatin accessibility was investigated through an integrative approach.


INTRODUCTION
The genome of all eukaryotic species is regulated at the chromosome level, 1 where DNA is packaged in a highly organized structure of DNA and histones. Gene expression is regulated through a network of physical interactions of enhancers, promoters, insulators, epigenetic marks, and chromatin-binding factors, which is responsible for the chromatin accessibility. Epigenetic marks such as DNA methylation, noncoding RNAs, and histone modifications can be investigated to obtain insight into regulation of the epigenome. Some histone modifications are highly informative regarding gene expression and are associated with transcriptional activation, promoters, and enhancers. [2][3][4][5] In addition, DNA methylation is important in identifying gene expression and gene silencing, as methylation and gene expression generally show an inverse correlation. Together these (epi)genetic marks can be used to annotate the functional genomic elements that determine gene expression.
The Functional Annotation of Animal Genomes (FAANG) consortium is a scientific driven community, with the aim of providing the functional annotation (functional maps) specifically for farm and companion animals. 6 Earlier projects in human 7 and model animals (e.g. Mouse ENCODE) provided strategies for using omics data to obtain insights into the functional genome; this is achieved by performing genome-wide analysis focusing on genome expression, regulatory functions, methylation, chromatin accessibility, and modifications, providing insights into the functional genome.
Cell lines provide an interesting model to study the genomic architecture and regulatory genome of species of interest. Cell lines directly derived from tissues or organs of an animal are referred to as primary cell lines. Such cells can then either continue growing indefinitely or die off after a certain number of cell divisions. [8][9][10] Cell lines that can be grown indefinitely (i.e. that have become immortalized) often show cell aneuploidy or heteroploidy, which is most pronounced in cancer cell lines. 10,11 In this study, a pig IPEC-J2 and chicken SL-29 (CRL) cell lines were used. The pig IPEC-J2 cell line is frequently used in e.g. intestinal transport studies due to the uniqueness of the cell line being neither transformed nor tumorigenic in nature. Chicken SL-29 (CRL) is useful for investigation of the substrate of virus propagation, recombinant protein expression, and recombinant virus production. 12 Characterizing commonly used cell lines holds value Pig IPEC-J2 cell line Chromosomal abnormalities and variations within the genome The whole genome sequence data were investigated in different ways to determine the structure of the genome. Chromosomal abnormalities such as aneuploidy and heteroploidy can occur in cell lines that grow indefinitely. We therefore first investigated the chromosomal structures and possible changes in aneuploidy. Aneuploidy events can be detected using whole genome sequencing (WGS) data by examining both the read-depth and the ratio of reads that support the alternative allele for heterozygous SNPs; this should be around 0.5 for diploid chromosomes ( Figure 1D), whereas e.g. around 0.33 or 0.67 Gene expression profile RNA-seq data provide insight into gene expression levels across the genome. These data can provide insight into elements that regulate gene expression like promotors, enhancers, and epigenetic marks, as well as chromosomal abnormalities affecting them. Of the 31,907 genes tested, 10,412 were expressed (transcript per million [TPM] >1).
Interestingly, the expression levels on chromosomes 2, 5, and 17 (all chromosomes with ploidy aberration or large structural changes, Figure 3) were higher compared with the diploid chromosomes. Gene expression levels per chromosome in the IPEC-J2 cell line were compared with the gene expression levels for jejunum tissue and jejunum-derived organoids and IPEC-J2 samples cultured for a longer time (van der Hee et al. 13 ; related to Figure S6). The gene expression in IPEC-J2 cells seems to be more variable between chromosomes when compared with gene expression in jejunum tissue and the organoids derived of the jejunum tissue (Table S3, supplemental text S1). A clear elevated gene expression is detected on chromosome 17 in the IPEC-J2 cells compared with tissue and organoids, and a notably high gene expression is observed on chromosome 2 for all samples.

Chromatin accessibility and genome architecture
Genome-wide profiling of histone modifications provides insight into chromatin structure, as well as the location of regulatory elements such as promoter and enhancer regions. Therefore, ChIP-seq data were generated and analyzed for four different histone modifications (Table 1) and the insulator CTCF to provide insight into the regulatory genome of the IPEC-J2 cell line. Furthermore, to assess the conformity between ChIP-seq experiments, ChIP-seq for three marks was repeated in another laboratory. The results of peak calling from the two respective experiments are shown in Table 1.
A high number of peaks were shared between the two experiments (70%-95%), with histone mark H3K4me3 sharing the lowest number of peaks most likely resulting from the lower number of reads for this mark in the ll OPEN  In ChromHMM analyses, the active promoter state is identified by an enrichment of histone marks H3K4me3, H3K27ac, and H3K4me1 ( Figure 4). The two enhancer states show an enrichment in H3K4me1 and H3K27ac. The histone mark H3K27me3 indicates the gene silencing state. Most of the genome is in a quiescent state (without any of the assayed histone marks; Figure 4B right panel), whereas a small fraction is in the weakly repressive state. Both the TSS as well as TSS +/À 2 kb show enrichment in promoter and enhancer states ( Figure 4B, right panel). The CpG islands show strong enrichment for the repressive state   Figure S8 for read alignments in this region). Further downstream of these genes, enrichment of the H3K4me3 mark can be observed in a gene-poor region (gene desert). Annotation of the peak regions identified for each histone modification provides insight into the types of functional elements close to the histone modifications ( Figure S9). The annotation of these peak regions shows distribution of histone modification peaks in different genomic regions, which confirm the enrichment of H3K4me3 and H3K27ac around the promoter region.
Overrepresented sequence motif analysis for histone marks associated with promoter and enhancer regions identifies types of transcriptions factors (TF) related to gene expression. The peaks called for H3K4me3, H3K27ac, and H3K4me1 were provided to Homer for identification of enriched TF involved in gene expression in IPEC-J2. Table 2 shows the three most enriched motifs and their corresponding TF identified for three histone marks associated with promoter and enhancer regions. Other motifs enriched (motifs with p values <1e-12) are shown in supplementary results (Data S1) The significant transcription factors identified were Sp100 and IRF for the peak regions of H3K4me3, E2F and ZNF449 for H3K27ac, and FOS and TEAD for H3K4me1. We identified 36,638 enhancer regions, and motif analysis for these enhancers is shown in Data S1. Significant TFs identified within the enhancer regions include Fra1, TEAD3, EWS-ERG fusion, and FOXN3, which are responsible for cell  iScience Article growth, tumor suppression, and suppression of transcription of transforming growth factor and play a role in several cancers. The CTCF motif generated using both MEME and homer tools ( Figure S10) shows high similarity with the human consensus sequence supporting the conserved nature of CTCFbinding sites. iScience Article DNA methylation profile of the genome Gene expression is negatively correlated with DNA methylation; therefore, investigating the methylome provides further insight into specific characteristics of the cell line's genome characteristics. The methylome was investigated using both reduced representation bisulfite sequencing (RRBS) and whole genome bisulfite sequencing (WGBS) ( Table 3).
Most DNA methylation is observed at CpG sites (42.76%-45.6%, Table 3), whereas at non-CpG sites little DNA methylation is detected (0.68%-1.13%). Therefore, as expected, the cell line displays a methylation pattern similar to what has been observed in porcine tissue methylation studies. 14 The average coverage for the RRBS methylation of chromosomes is 14.2, and most chromosomes are similarly covered  iScience Article ( Figure S11), except for chromosome 17, which has a higher read coverage of 22.4, confirming the triploid status of this chromosome also supported by the RRBS data. A global view of WGBS CpG methylation levels per chromosome shows methylation levels to fluctuate between 0.3 and 0.6 ( Figure S12).
WGBS is referred to as the gold standard for investigating DNA methylation, in particular because it provides more information on a whole genome level compared with RRBS. RRBS is usually perceived as being more cost-effective, as a reduced number of sites are sequenced (usually more focus on CpG regions). We therefore investigated whether all sites identified by RRBS are also covered by WGBS. To do so, we disregarded any sites with a coverage <10 for both types of data as not being sufficiently informative. In total, 23,952 sites covered by RRBS are not covered by WGBS. Further details on possible functional relevance of these RRBS-specific sites is done for the chicken cell line (see later discussion).

Integrative insight into epigenome marks
The interactions between regulatory elements and methylation play a critical role in gene expression. We studied these complex interactions and the relation to gene expression using an integrative approach. The individual relationships between the regulation of gene expression and histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K27me3) and CTCF are shown in Figure 5A. The histone modifications associated with promoter and enhancer regions (H3K4me3, H3K4me1, and H3K27ac) show patterns as expected with a positive correlation with gene expression. Histone mark H3K27me3, which is associated with gene suppression, shows a negative correlation with gene expression ( Figure 5B). As expected, there is a negative correlation for the methylation data at the TSS with the promoter and enhancer regions (H3K4me3 and H3K27ac). The methylation levels within the gene-body (GB) show both a weaker negative correlation to CTCF and H3K27me3 and a positive correlation (WGBS_GB) with H3K4me3 and H3K27ac. A positive correlation is also observed between gene expression and the histone marks H3K4me3 and H3K27ac.
The methylation levels at the TSS and within the gene-body for RRBS and WGBS ( Figure 6A) follow the expectation that methylation levels are negatively correlated with gene expression (i.e. highly expressed genes show lower methylation levels and vice versa) ( Figure 6B). This relationship between methylation and expression values is especially evident at the TSS. Low methylation of RRBS within the gene-body can likely be explained by a lack of coverage in the gene-body compared with WGBS. In addition, for WGBS the methylation level in the gene-body increases at higher gene expression.  Figure 8A). Several large copy-number variants such as deletions and duplications of more than 1Kbp were detected using CNVnator, ( Figure 8B).
Relative abundance of copy-number variants is higher for some of the small micro-chromosomes (i.e. chromosomes 16,25,30,31,32,33), whereas the macro-chromosomes have relatively fewer copy-number variants. Intron variants (48% of total variants) and intergenic variants (28% of total variants) are the most abundant consequences from the copy-number variants in this cell line. The effects, as determined by VEP, of both copy-number variants and large structural variants identified through CNVnator are shown in Figures S13 and S14. The expression profile for this cell line was investigated to obtain further insight into the genes expressed and interaction of regulatory elements, aneuploidy, and CNVs affecting gene expression. We tested the expression of 24,356 genes, of which 13,546 were expressed (TPM > 1).
Higher levels of gene expression are observed on chromosomes 20, 25, 27, and 33 ( Figure 9, supplemental text S4). This shows the influence of aneuploidy and structural variation on gene expression levels, e.g. with tetraploid chromosome 20 showing a higher expression level compared with the diploid chromosomes (e.g. chr 1).

Chromatin accessibility and genome architecture
As described for pigs, histone modifications are of importance to investigate chromatin accessibility and provide further insight into regulatory elements. The peak calling results for the chicken cell line are shown in Table 4. The peaks called show much overlap between the replicates for H3K4me1, H3K4me3, and H3K27ac. The two ChIP-seq experiments were compared using Pearson correlations (Table 4). High correlations between the read coverage of overlapping regions are observed, together with low correlations between the signal values from overlapping peaks for H3K27ac and H3K4me1. H3K4me3 shows a higher correlation of signal values of overlapping peaks.
Various chromatin states were identified, through identification of presence or absence of histone marks using ChromHMM, which provides insight into interactions between different histone marks. Histone marks H3K4me3 and H3K27ac are enriched around the TSS of expressed genes, with H3K4me3 enriched at approximately 20% of the TSS and H3K27ac enriched at >5% of the TSS ( Figure S15). The chromatin dynamics ( Figure 10A) displays states 2 and 3 as repressed states due to the presence of H3K27me3. States 4, 5, and 6 display active dynamics because these are associated with the presence of H3K27ac/H3K4me1, H3K27ac/H3K4me3, and H3K4me1/H3K4me3.The states identified around 2 kb of the TSS are states 4, 5, and 6, with states 5 and 6 showing a very strong enrichment. State 6 is enriched within exons, whereas states 1, 4, 5, and 6 are enriched within genes. State 6 also shows enrichment within CpG islands. A strong enrichment of H3K4me1 and H3K27ac is seen for the expressed gene FKBP5 ( Figure 10B); enrichments of H3K27me3 are seen around the lowly expressed genes (MICAL1 and TULP1).
Annotation of the peak regions to genomic features provides insight into functional elements related to the histone modifications. As for the histone/CTCF marks in the pig cell line, most marks are found in intron and intergenic regions of the genome. Furthermore, a large percentage (16%-21%) of the H3K4me3, CTCF, and H3K27ac histone modifications are found within promoter regions ( Figure S16), and 11% of H3K4me3 has been identified in exon regions of the genome.  iScience Article Significant consensus sequences for known motifs identified the TFs IRF1, ZNF711, YY2, AP-2alpha, Atf3, and Pax8 for the three histone marks (H3K4me3, H3K4me1, and H3K27ac) ( Table 5). In the H3K4me1 consensus sequence of the peak region, a core promoter factor PSE (SNAPc) is observed. In this cell line, a total of 18,516 enhancer regions were identified, and the motif analysis is shown in Data S2. Significant TFs identified within the enhancer regions include SMAD2:SMAD3, EWS-ERG fusion, TWIST1, and TEAD3, which play important roles in regulation of transcription in transforming growth factors and embryonic development and are associated with cancers. The motif sequence for CTCF identified with both homer and MEME ( Figure S17) is similar to the human consensus sequence, supporting the conservation of the CTCF-binding site beyond mammals.

Genome wide chromatin accessibility
Chromatin accessibility was profiled in the chicken SL-29 cell line using ATAC-seq data, from which 86,983 peak regions were identified. To infer the functional significance of accessible regions that were identified, consensus peaks were characterized by genomic localization. Annotation of ATAC-seq showed most accessible (open) chromatin is found in the intron and intergenic regions of the genome, ( Figure S18), iScience Article with 12% of accessible chromatin found at promoters (as defined by TSS location). To interrogate the potential function of accessible regions (peaks), they were subjected to a consensus motif enrichment analysis.
Overall, consensus peaks identified for PAX6 recognition sites as most significant, with about 1.74% of accessible regions harboring this consensus motif ( Table 6). Roles of the TFs identified are as expected related to this cell line, e.g. PRDM1 TF, which is involved in immunity, and PRDM15, which regulates transcription of WNT, and TFs involved in the MAPK-ERK signaling pathway that is related to pluripotency of a cell.

DNA methylation profile
We also determined the average methylation levels for cytosine for the chicken cell line, calculated from both RRBS and WGBS data ( Table 7).
As expected, average methylation (>10 reads) is observed at CpG sites (37%-59%), and slightly higher average non-CpG methylation (1%-2%) is observed. Average chromosome level methylation levels estimated from WGBS fluctuate around 0.2-0.6 ( Figure S19). The average read coverage calculated for the WGBS data was 55.1, with chromosome 20 having a very high ($95) read coverage ( Figure S20).   iScience Article ChIP-seq data). Examples of such regions (chromosome 1 and 2) covered by only RRBS data including predicted active enhancers/promoters are shown in Figure 11.
The transcription factor motif analysis of identified enhancers that overlap with the identified specific RRBS regions is shown in Data S2. One of these motifs, for transcription factor YY2, has a strong CpG consensus sequence, suggesting that these regions only covered by RRBS may include important regions involved in regulation of gene expression. iScience Article

Integrative insight into epigenome marks
An integrative approach was used to gain insight into the dynamics of methylation and histone modifications for regulation of gene expression ( Figure 12). As expected, methylation levels are negatively correlated with gene expression (i.e. highly expressed genes show lower methylation levels and vice versa) at the TSS. Within the gene-body we observe a slight increase in methylation levels with WGBS followed by a decrease with higher expression levels (20 > TPM < 100). The low methylation seen for RRBS within the gene-body can be explained by a lack of coverage in the gene-body compared with WGBS. Heatmap correlations reflect the results observed in the boxplots, with negative correlations between methylation levels both at the TSS and within the gene-body and gene expression.
The correlations between the 4 histone marks (H3K4me1, H3K4me3, H3K27ac, and H3K27me3), ATAC-seq, and distinct classes of gene expression levels are visualized in Figure 13. Enhancer histone marks, H3K27ac and H3K4me3, together with ATAC-seq show an increase in peak score for genes with a higher gene expression level. H3K4me1 and H3K27me3 show little variation in peak scores across different classes of gene expression. For H3K27me3, slightly lower peak scores were observed for very highly expressed genes, whereas higher peak score were seen for very lowly expressed genes.
The positive correlation observed for H3K4me3 and H3K27ac with the gene expression ( Figure 13B) is higher compared with the methylation and gene expression results presented in Figure 12B. Negative iScience Article correlations are observed between methylation levels at both the TSS and gene-body for RRBS and WGBS with promoter and enhancer marks (H3K4me3, H3K27ac), as well as with ATAC-seq. Low correlations between the histone marks H3K4me1 and H3K27me3 with methylation are observed. An example of a region with genes and all epigenomic modifications is shown is Figure S21.

DISCUSSION
Cell lines provide an ethical approach for research in animal production, and thus, molecular characterizations are necessary for functionally relevant cell lines. The pig IPEC-J2 and chicken SL-29 cell lines have never been characterized using an integrative genomics approach. These two cell lines were chosen specifically for their usability and application to research in animal production as well as biomedical research. Using different omics data (WGS, RNA-seq, ChIP-seq, ATAC-seq, RRBS, WGBS), the genome structure, transcriptome, methylome, and chromatin accessibility were investigated and characterized for these cell lines; this provides a reference of the genome architecture of these cell lines for future functional studies using these cell lines as well as for farm animal research community.
Results for both pig and chicken cell lines show that aneuploidy is common in both cell lines, as we observed various chromosomes that were either (partly) monoploid, triploid, and even tetraploid. More aneuploidy as well as more structural variants were observed for the chicken SL-29 cell line in comparison with the pig IPEC-J2 cell line. For SV this could be due to the additional filtering of common structural iScience Article variants from the pig cell line. Previous studies have shown that more chromosomal abnormalities, as well as structural variants occur when cell lines are maintained in culture for a longer time (more passages), 15,16 emphasizing the importance of limiting the number of cell passages for cell line experiments. It has also been suggested that the culture conditions can influence the chromosomal stability. 17 Conditions such as techniques for cell detachment and disaggregation and oxygen concentration during culture can also affect the chromosomal stability and genomic integrity over a longer period of culture. 18 To ensure that conditions do not influence the genomic integrity, precisely defined protocols for cell culture should be followed as much as possible.
The increase in ploidy leads to an increase in expression of the genes on these chromosomes, likely affecting functional relevant aspects of these cell lines. A comparison of the results from the pig cell line with similar tissue type and organoid samples showed pronounced higher rates of expression of the genes on the triploid chromosome 17 of the pig cell line. From this comparison it is evident that these cell lines show higher rates of gene expression on all chromosomes, followed by organoids, and with tissue showing the lowest rates of gene expression; this is in agreement with studies showing that organoids resemble gene expression levels and physiology of tissue more closely than cell lines. 19,13 It has also been shown that aneuploidies and structural variants can influence gene expression level; specifically, structural variants can cause changes in cis-regulatory elements, promoters, and enhancers. [20][21][22] These expression observations in cell lines provide a useful resource for studies where potential genes of interest can be identified and investigated for increased expression levels.
The results from the WGS and RNA-seq show the potential of using these assays to detect chromosomal abnormalities, in addition to investigation of variation in the genome 23,24 and gene expression, respectively. Traditional methods such as karyotyping and staining (multifluor-fluorescence in-situ hybridization) are limited to detection of chromosomal abnormalities, specifically chromosomal rearrangements such as translocations, and provide little insight into genome variation. 25 WGS as a tool for detection of pre-and postnatal anomalies is investigated and implemented more regularly. 26,27 An example of using NGS (nextgeneration sequencing) for detection of prenatal anomalies is discussed in Guseh 2020 28 , where trisomy 21 is detected when a higher proportion of DNA fragments are mapped to chromosome 21 in comparison to a Ref. 28; this shows the potential of WGS for detection of chromosomal abnormalities.
Chromatin accessibility and histone modifications were investigated to gain further insight into the genome architecture of both cell lines using ChIP-seq data for CTCF and four histone modifications: H3K4me1, H3K4me3, H3K27ac, H3K27me3. The importance of using a standardized protocol for comparative ChIP-seq studies was explored using two technical replicates of the ChIP-seq experiments for three histone marks, performed in two different laboratories and using different inputs as a background (immunoglobulin G [IgG] and DNA, respectively); this provides insight into the reproducibility of results between different laboratories and the use of DNA or IgG as an input. We conclude there is a high consensus between overlapping peaks of the experiments, and secondly, the read coverage between experiments showed medium to high consensus between experiments. However, signal values show little correlation between experiments, suggesting that the confidence related to the high number of overlapping peaks is limited and should therefore be used with care if the signal values are used for comparative analysis. A reason for this low correlation of signal values could be the differences in background signal in the two experiments. ChIP-seq guidelines and practices from (mod)ENCODE have found that an IgG control mimics a ChIP experiment more closely than a DNA input control. Cases of strong sonication bias are rarely observed, but this can potentially affect peak calling. 29 Thus, for comparative studies utilizing ChIPsequencing, similar protocol should be followed.
Peak regions identify possible binding sites of proteins associated with DNA (protein-chromatin interactions) and provide insight into regulatory regions and elements. The number of histone marks identified are different between tissues, stages of development, and number of reads sequenced. ENCODE standards have shown that the number of peaks that can be identified ranges from thousands to tens of thousands of peaks. 29,30 Coincidentally the number of peak regions called for narrow peaks (H3K4me3 and H3K27ac) for both pig and chicken cell lines was similar to that in previous studies in monogastric species. 31,32 We confirmed the quality of the data by investigating the occurrences of the respective marks around 2 kb of the TSS. We observed an elevation of the marks H3K4me3 and H3K27ac around the TSS. It has been observed that H3K4me3 is most often found at TSS (regardless of H3K27ac) and indicative of iScience Article a promoter, as seen in this study. H3K4me3 is often co-occupied by H3K27ac in the genome [33][34][35] ; however, H3K27ac is not always found to be co-occupied by H3K4me3 and is also observed further from the TSS site. H3K4me1 and H2K27ac are indicative for enhancer regions. 33 The number of broad peak regions of H3K27me3 for both cell lines is similar to that observed in other vertebrate species, with variations between tissues. 36 This mark is associated with gene silencing, as shown in chromatin state analyses. 34,35 Combinations of histone modifications result in variable chromatin structures, leading to different levels of transcription, which is also reflected in the integrative analysis.
To evaluate the quality of the experiments and the success thereof, we compared the consensus motif for the CTCF sites identified for pig and chicken with the human CTCF consensus sequence. There was good similarity between the consensus motifs for both pig and chicken and human. Furthermore, this motif was also identified in the human K562 cell line as a CTCF mark. CTCF is a highly conserved protein in mammals (between pig and mouse) as well as in vertebrates, [37][38][39] which is indeed confirmed by our CTCF results, supporting the good quality of our results for CTCF.
Further investigation into motifs identified for the histone marks in both cell lines, together with comparison of similarities thereof to known motifs, provides insight into possible TFs. Firstly, interferon receptor factor (IRF) was identified in both cell lines (pig IRF2 and chicken IRF1). IRF is part of a TF family that is found in humans as super enhancer TFs and is highly conserved within species. 40 This TF plays a role in immunity, cell growth, differentiation, and anti-tumor defences in vertebrates. 41-43 TFs such as YY2, TEAD, and E2F, which regulate cell growth and proliferation as well as development were also identified. 44,45 Open chromatin was investigated using ATAC-seq data for the chicken cell line with a number of regions identified, slightly lower than expected when following ENCODE recommendations, which suggests >100,000 peaks. However, this is similar to the number of enriched regions identified by other studies in animals. 46,47 Most of the accessible chromatin was identified within intronic, intergenic, and promoter regions, which is in line with previous research in multiple species and suggests similar patterns between cell lines and tissues. 46,47 Further insight into gene expression and the molecular characteristics of both cell lines was obtained through characterization of the methylome, which ensures comprehensive characterization of the functional genome. DNA methylation is an epigenetic mark that is found in most species and has been found to be inherited and influenced by environmental factors (and often used for comparative analysis and fundamental research into e.g. adaptation). 48,49 In both cell lines a higher coverage in the methylation data (RRBS and WGBS) was observed on aneuploid chromosomes, relating to the change in copy number that is reflected in the methylome as well as on a whole genome level. Methylated cytosines occur primarily at CpG sites in most cell types, whereas non-CpG methylation (CHG, CHH) occurs only in specific cell types such as brain, oocytes, and stem cells 50 ; this is in agreement with our observation in our cell lines, where CpG methylation is the primary type of methylation. In addition, the non-CpG methylation levels in the chicken cell line are slightly higher than observed in tissues of birds, excluding brain tissue. 51,52 This occurrence could be due to the chromosomal aberrations and higher ploidy observed and the cell lines behaving in a tumorigenic manner. Ploidy effects on DNA methylation (epigenome) have been theorized in studies on plants 53,54 and tumors. 55,56 We observed regions within the genome covered by RRBS that were not identified by WGBS or had a low coverage (<10 3) by WGBS. Some of these regions are highly relevant, as they are located at the promoter of specific genes (close to the start site of the gene) or overlapping with enhancer elements. It is noteworthy that the coverage of the WGBS data is high (>503) and therefore should theoretically cover all of the genome, especially at informative sites. 57 This is a relevant observation, as WGBS is often seen as the gold standard for investigating the methylome, as it is supposed to cover almost all sites in contrast to RRBS, which is seen as a cost-effective alternative method. RRBS usually shows reduced coverage of methylated sites in intergenic and distal regulatory regions, especially in comparison with WGBS 48 but our study suggests that RRBS is complementary to WGBS, and to obtain the most comprehensive genome-wide estimation of DNA methylation the two methods should be combined. iScience Article gene-body in the pig cell line increases slightly with higher expression levels, whereas the methylation in the gene-body of the chicken cell line decreases slightly with increased expression levels; this is similar to what has been observed within the gene-body in earlier studies of the methylome in pig and avian species. 14,51 This phenomenon has not been studied extensively; however, Derks et al. 51 suggested possible explanations such as methylation suppressing the transposable elements (TE) and preventing TE insertions, which can be interruptive in the genome. A reason for this could be the higher number of TE in mammals compared with birds, which require higher methylation levels in gene bodies. As expected, the histone modification H3K27me3, which is associated with gene silencing, shows a strong positive correlation to lowly expressed genes and a negative correlation to highly expressed genes. Histone modifications associated with promoter and enhancer regions all show a positive correlation to highly expressed genes and a negative correlation to lowly expressed genes; this confirms previous studies regarding these epigenetic marks regulating gene expression levels.

Conclusion
This paper is the first to describe the molecular characteristics (structure) of the pig IPEC-J2 and chicken SL-29 cell lines. The genomic approaches provided an insight into the different levels of the epigenome influencing gene expression in these cell lines, as well as provided a description of the architecture of the epigenome. Chromosomal abnormalities, copy-number variations, and aneuploidy, typical for a cell line, were identified for several chromosomes for both cell lines. These cell lines are referred to as nontumorigenic and nontransformed; however, as these cells go through many passages, aneuploidy events do occur. Future researchers should note the characteristics of these cell lines and proceed with caution for interpretation of results. Epigenetic marks such as histone modifications, chromatin accessibility, and DNA methylation were integrated with expression data for both cell lines; this provided insight into the interactions between the epigenetic marks and gene expression. The characteristics as described in this paper for these cell lines will be similar for cells cultured using the same protocol and cells grown for the same number of passages. Deviations from these guidelines/methodologies are expected to result in different genomic and epigenomic characteristics. Understanding these cell lines and the (epi)genetic makeup thereof can provide a better understanding of the limitations of these cell lines as a model for in vivo research. We propose these cell lines as a reference for future functional and comparative studies in animals, whereby knowledge of ploidy, expression profile, chromatin landscape, and methylome provide the backbone for the comparison.

Limitations of the study
The ATAC-seq data were unavailable for the pig IPEC-J2 cell line. Further investigation using traditional methods for confirmation chromosomal abnormalities e.g. karyotypes staining would be beneficial.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Jani de Vos (jani.devos@wur.nl).

Materials availability
This study did not generate new unique reagents.

Data and code availability
All data types (WGS, RNA-seq, WGBS, RRBS, ChIP-seq -H3K4me1, H3K4me3, H3K27ac, H3K27me3, CTCF and ATAC-seq) from pig IPECJ2 (PRJEB59474) and chicken CRL (PRJEB59475) cell lines have been deposited at ENA and are publicly available as of the date of publication. Accession numbers are listed in the key resources table.
All original code has been deposited at Zenodo (https://doi.org/10.5281/zenodo.7274310) and is publicly available as of the date of publication. DOIs are listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
The intestinal epithelial pig IPECJ2 (ACC-701) cell line was obtained from the cell repository at DSMZ, 87 which is an intestinal columnar epithelial cell line derived from the mid-jejunum of a neonatal unsuckled female piglet (piglets less than 12 hours old). These cells were originally isolated in 1989 by Helen Berschneider at the University of North Carolina. 88 For chicken the SL-29 cell line (ATCC CRL-1590), was derived from embryonic fibroblast cells obtained from the cell repository at ATCC. 89 Cells were cultured at 37 C and 5% CO2 in Dulbecco's MEM with 5% FBS, Pen/Strep and Glutamax using a standard FAANG operating procedure. The media was refreshed twice a week and progressing to the next passage mostly 1/20 of the cells were transferred to a new flask. These cells were cultured for 4 passages in chicken SL-29 and 67 passages in pig, before harvesting.

Sequencing and assays
These cells were then used for whole genome sequencing (WGS), RNA sequencing, reduced representation bisulphite sequencing (RRBS), whole genome bisulphite sequencing (WGBS), ChIP-seq and ATACseq. DNA and RNA were isolated from the cell lines using the All Prep DNA/RNA Mini Kit (Qiagen) following manufacturer's instructions. WGS libraries of $ 300-400 bp fragments were prepared using Illumina paired-end kits (Illumina, San Diego, CA) and 150 bp paired-end sequenced with Illumina HiSeqX. RNA-seq library preparation and sequencing was done as described in van der Hee et al. 13

Data analysis
Pig (Sus Scrofa 11.1) and chicken (Gallus gallus GRCg6a) reference genomes, together with ENSEMBL annotations (Sus Scrofa 11.1 -release 103 & Gallus gallus GRCg6a -release 103) were utilized for all data analyses of our study. Default settings were used unless otherwise stated, and a brief overview of the data analyses is shown in Figure S1. Genome indexes were built using the required reference genomes (and annotations where required) with the tools described below. Quality of all datasets was evaluated, and the statistics thereof is shown in Tables S1 and S2.  60 The aligned reads were further processed using samtools (v1.9) 61 to fill in mate coordinates, as well as add requirements from mate related flags. Mapping quality was evaluated using Qualimap (v2.2.1) 62 to ensure correct and accurate mapping.
Read-depth, genome structure, and possible large structural variants were evaluated using the tinycov package (v0.3.0) 91 and SNV calling was done using FreeBayes (v1.3.1). 63 Read support (ratio) was evaluated for heterozygous variants within the VCF file using a custom unix script 86 and the results were plotted as histograms. The read-depth and SNV's were then plotted using Circos (v-0.69-9) 64 for the visualization at a whole genome level, as well as for specific regions of interest.
Structural variant analysis of both cell lines was completed using Manta (v1.4.0). 65 For the pig cell line these variants were filtered in the following way: structural variants from healthy pig tissue samples with similar high read depth (in house samples: two muscle and one liver) were identified using Manta and overlapping structural variants between cell-lines and tissues were filtered out from the cell line structural variants. This strategy was used to exclude naturally occurring structural variants not unique to the cell line. For chicken Manta analysis this strategy was not possible due to lack of WGS chicken data with sufficient high coverage. Large SVs (deletions and duplications) were investigated using CNVnator (v0.3.3), 66 and results verified using a genome browser. 92 Variant Effect Predictor 67 was used to determine the consequences of all copy number and structural variants on the genomes.

RNA-sequencing analysis
Stranded RNA-seq datasets for both cell lines (pig IPECJ2 and chicken SL-29) were trimmed for adapters, quality and minimum length using TrimGalore v0.6.4 a wrapper for Cutadapt v1.18 93 and the sequence data quality was evaluated using FastQC v0.11.9. 68 The trimmed reads were used for alignment and gene quantification using RSEM, 69 with STAR v2.7.3a as aligner. 70 Further analyses were completed using custom shell scripts for basic statistics and average gene expression level per chromosome was calculated and plotted using a custom python script with the Seaborn package. 86 Various minimum transcript per kilobase million (TPM) thresholds were implemented for different analyses to reduce noise of uninformative genes that are very low expressed.
Additional raw RNA-sequencing data was downloaded from ENA from the PRJNA610529 project: two pig jejunum organoid samples grown for different time periods (3 weeks (ENA:SAMN14300031) and 12 weeks (ENA:SAMN14300021), a 5 week old pig jejunum tissue sample (ENA:SAMN14300018), cell lines IPECJ87, an IPECJ2 cell line grown for 87 passages (ENA:SAMN14300016), and IPECJ91, an IPECJ2 cell line grown for 91 passages (ENA:SAMN14299997). We trimmed, aligned, and completed gene quantification of this data following the same procedure as the above procedure used for the IPECJ2 cell line used in this study. These samples were used to compare the average gene expression level per chromosome in jejunum tissue, organoid and IPECJ2 cell line.

ATAC-sequencing analysis
Trimming and alignment of the ATAC-seq reads were completed as described above for ChIP-seq reads. PCR duplicates were removed using picard (v2.23.9), and only unique, paired-end reads were kept for further analysis. 83 Further filtering included removing the mitochondrial (MT) data, as a method of reducing bias in the results. Reads were shifted +4 bp and À5 bp for positive and negative strands respectively using an in-house unix script 86 and this was done to account for the 9 bp duplication that occurs due to DNA repair of Tn5 transposase nick. 95 This shift ensures accurate regions of the chromatin state for TF-footprint and motif related analysis. MACS2 (v2.7.1) ,73,74 was used for peak calling using default parameters. Peak annotation (homer v4.1.0) and motif analysis 78 for the peak regions were obtained, and identified motifs were scanned for known motifs such as TFBS and TATA-box using the Homer tool. 80

Methylation analysis
RRBS and WGBS raw reads were trimmed as described above for ChIP-seq reads, with an additional -rrbs parameter for RRBS data. Genome index was built using BSseeker2 v 2.1.8, 96 with bowtie2 as aligner, for RRBS and WGBS genome, with additional parameters for RRBS: -r -l 10 -u 280. Thereafter the reads were aligned using BSseeker2 (v 2.1.8), with additional parameters for RRBS: -rrbs, -c MspI, -L 10 -U 280 -m 4 and for WGBS: -I 0 -X 1000 -m 4. BSSeeker2 was used for the alignment as this tool is tailored for RRBS as it 'builds' a custom reference based on the restriction enzymes cutting sites. It is also more suitable to align gapped-reads than other tools commonly used for methylation analysis. 96 We decided to keep methylation analysis standard across assays and thus implemented BSSeeker2 for WGBS data as well.
The aligned reads of the biological replicates of the pig cell line were merged for further analysis, after a Pearson correlation 97 showed a high correlation of 0.96 between the two RRBS technical replicate samples ( Figure S2). CGmaptools (v0.1.2) 84 was implemented for the methylation calling from the aligned reads. Further statistical analysis of the methylation data was completed using CGmaptools (v0.1.2) and MethylKit (v1.16.1). 85 Correlations and clustering between the biological replicates were analyzed using MethylKit.
Finally, the number of sites identified by RRBS data and not by WGBS data was investigated using the following approach. Firstly, the methylation calls were filtered for only CG sites and for a coverage of more than 10. Thereafter bedtools (v2.30.0) 82 was used to identify the sites unique to RRBS data, and for merging these regions. Functional importance of these regions was investigated by overlaying regions with enhancer and promoter regions detected from the ChIP-seq analysis, visual examination of the sites using Jbrowse, 92 and motif discovery using homer.

Integrative analysis
An integrative approach was used to investigate the relationship between WGS, expression data (RNAseq), methylation status and ChIP-seq marks. Output files from the homer annotate called peaks for the respective marks, gene expression values from RSEM, and methylation calls from CGmaptools were used. For this investigation, correlations, scatterplots, and boxplots were created from these files using an in-house python script. 86

QUANTIFICATION AND STATISTICAL ANALYSIS
Pearson correlations between the read coverage and signal value of peaks for histone marks H3K4me1, H3K4me3 and H3K27ac of each experiment was done using bedtools (v2.30.0), 82 Figure S2) confirmed that the two samples could be merged for further downstream analysis. Integrative analysis used heatmaps, Pearson correlations, scatterplots and boxplots to investigate the relationships between gene expression, various histone modifications indicating promoters, enhancers and gene silencing (H3K4me1, H3K4me3, H3K27ac, H3K27me3